A Translator's Workstation

Authors

  • Eugenio Picchi
  • Carol Peters
  • Elisabetta Marinai
Abstract

A description is given of the present state of development of a workstation designed to provide the translator with efficient and easy-to-use computational tools. The aim is to offer translators fast and flexible on-line access to existing dictionary databases and bilingual text archives, and also to supply them with facilities for updating, adding to and personalizing the system data archives with their own material.

1. INTRODUCTION

Over the last few years, at the Institute for Computational Linguistics in Pisa, an open-ended modular set of tools, known as the PiSystem, has been designed and developed to meet the various requirements of literary and linguistic text processing and analysis. The core component of the system is the DBT, a textual database management and query system that has been implemented in different configurations to perform specific text and dictionary processing tasks. Other components can be integrated with this system kernel as required, depending on the needs of a particular application. (For a detailed description of the DBT in its various configurations see Picchi, 1991.)

Within this general framework, in the present paper we describe the construction of a Translator's Workstation. Translators need fast and flexible tools to assist them in the task of rendering an L1 text in L2 as fluently and faithfully as possible. They also need tools that are easy to use, relatively inexpensive and, wherever possible, portable, as many translators are freelancers and much translating work is done at home. These requirements have been borne in mind in the design of the Workstation.

The Workstation is being constructed around two main components: a bilingual lexical database system and a system that creates and manages bilingual text archives. In addition, procedures are being provided to permit users to update the basic system archives with their own data.
At present, the system languages are Italian and English; however, the procedures are designed to be generalizable: given the necessary lexical components, they could be transported to other pairs of languages. The user can also access monolingual LDBs, and invoke Italian and English morphological programs to query the dictionary and text databases or to check inflectional paradigms. The entire system is menu-driven; the translator is guided in his use of each component by a set of menus, and context-sensitive Help can be invoked to explain the functionality of each command.

2. THE BILINGUAL LEXICAL DATABASE SYSTEM

The bilingual lexical database system was first described in Picchi et al. (1990); it now forms part of the MLDB, a multilingual integrated lexical database system implemented within the framework of the ACQUILEX project [1] and described in detail in Marinai et al. (1990). The lexical components of the MLDB include the Italian Machine Dictionary (mainly based on the Zingarelli Italian Dictionary), and LDBs derived from the Garzanti 'Nuovo Dizionario Italiano' and the Collins Concise Italian-English, English-Italian Dictionary; we hope to add an English LDB shortly.

[1] ACQUILEX is an ESPRIT Basic Research Action which is developing techniques and methodologies for utilising both monolingual and bilingual machine-readable dictionary sources to construct lexical components for natural language processing systems.

Actes de COLING-92, Nantes, 23-28 août 1992 / Proc. of COLING-92, Nantes, Aug. 23-28, 1992

2.1 Querying the Bilingual LDB

The translator will primarily be interested in the bilingual dictionary data. Using the bilingual LDB system he can retrieve, for a given lexical item, valuable information at all levels (e.g. translation equivalents, examples of usage, syntactic information) that is inaccessible using traditional dictionary lookup.
The LDB query system offers dynamic search procedures that permit the user to navigate through the dictionary data and within the different fields of the entry in order to access and retrieve information in whatever part of the dictionary it is stored, specifying the language on which the query is to operate. Any lexical item or combination of items entered as a value is searched in the database with reference to its particular function in the entry, and the results (i.e. the number of occurrences of the item) are displayed field by field. The user can then select, view and print those results that interest him. Morphological procedures can be used in order to search for the entire inflectional paradigm of a word throughout the dictionary; this is particularly useful when looking for information on the usage of a given lexical item in the example fields. A full description of the LDB query language and a complete list of all the functions implemented are given in Marinai et al. (1990).

The translator can also access and query the monolingual dictionaries maintained by the system. The different perspective on the data provided by a monolingual entry often gives a more complete view of a given lexical item and its usage than is provided by the bilingual entry alone. A procedure has thus been implemented to permit semi-automatic mapping between bilingual and monolingual LDBs. Equivalent entries from the separate dictionaries can be combined, and links are created between them semi-automatically at the sense level, mainly on the basis of information that can be extracted from definitions, examples and semantic labels. In this way, we create a more complete composite entry which represents the sum of the information contained in the individual dictionaries (see Marinai et al., forthcoming). The translator can use this procedure to access, compare and rapidly scan the lexical information given for the same item in different source dictionaries.
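The field-by-field search described in this section can be illustrated with a minimal Python sketch. It is not the LDB query language itself: the entry layout, the field names, the toy data and the `PARADIGMS` table (a stand-in for the morphological procedures) are all assumptions made for illustration, and the counts reported are numbers of entries per field rather than raw token occurrences.

```python
from collections import defaultdict

# Hypothetical toy entries; the field names are illustrative, not the MLDB schema.
ENTRIES = [
    {"headword": "casa",
     "translation": ["house", "home"],
     "example": ["a casa mia", "le case popolari"],
     "example_translation": ["at my house", "council houses"]},
    {"headword": "popolare",
     "translation": ["popular"],
     "example": ["case popolari"],
     "example_translation": ["council houses"]},
]

# Hypothetical stand-in for a morphological generator: lemma -> inflected forms.
PARADIGMS = {"casa": {"casa", "case"}}


def build_index(entries):
    """Map each token to the fields, and the entries, in which it occurs."""
    index = defaultdict(lambda: defaultdict(set))
    for entry_id, entry in enumerate(entries):
        for field, value in entry.items():
            texts = [value] if isinstance(value, str) else value
            for text in texts:
                for token in text.lower().split():
                    index[token][field].add(entry_id)
    return index


def query(index, item, expand=False):
    """Return, field by field, the number of entries containing the item.

    With expand=True the whole (assumed) inflectional paradigm is searched.
    """
    forms = PARADIGMS.get(item, {item}) if expand else {item}
    hits = defaultdict(set)
    for form in forms:
        for field, ids in index.get(form, {}).items():
            hits[field] |= ids
    return {field: len(ids) for field, ids in hits.items()}


INDEX = build_index(ENTRIES)
print(query(INDEX, "casa"))
print(query(INDEX, "casa", expand=True))
```

Searching with `expand=True` also finds the plural form in the example fields, which is the use of morphological expansion that the text singles out as particularly helpful.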
2.2 Specializing the Bilingual LDB

In the version of the bilingual LDB that we are implementing in the Translator's Workstation, the user will also have functions available for adding his own information to the bilingual entry. This will be particularly useful for the translator working in a specific domain, who may well accumulate information on the usage of particular terms and expressions within this discipline which is not registered in any dictionary. He can call the User Update Procedure, which permits him to add to the data in the lexical entries as he wishes, as long as he respects the data representation schema.

The procedure will work in interactive mode. The user calls the lexical entry to which he wishes to add information by entering the headword on the keyboard. The structured and tagged entry is displayed on the screen. The user then invokes a Help function to display the different functions that can be used to modify the entry. All the information added by the user is recorded in a special User Memo Section. Within this section, he is given a choice of fields in which he can enter his data. These fields are similar to those used in the rest of the entry schema, and consist of fields for translations, examples, translations of examples, semantic indicators, and various kinds of semantic labels: subject, usage, geographic and register codes (for a detailed description of the data representation schema we use, see Calzolari et al., 1990). Purpose-written dynamic indexing procedures will then be executed on this new data so that it becomes directly accessible for subsequent querying; the only exception is the User Note field, which is used for free comments by the translator. In this way, the translator is able to exploit and reuse information acquired as a result of his own experience and activity.

3. PARALLEL TEXT RETRIEVAL

The considerable attention now being given to corpus-based studies means that there is also growing interest in the creation of bilingual reference corpora. Such corpora will be important sources of information in many studies of the linguistic phenomena involved in the process of transferring information, ideas and concepts from one language to another, as they can provide large quantities of documented evidence on the possible realization of a concept in two languages, according to a number of contextual factors, e.g. usage, style, register, domain, etc. The chance to access a corpus of this type would be of enormous help to the translator in his search for that elusive 'right' translation equivalent which is so often not found in the bilingual dictionary.

So far, most of the systems studied to manage bilingual corpora use statistically based procedures to align the texts at the sentence level. Such programs often request the user to supply not only an SL word but also a TL candidate translation in order to construct parallel concordances. Church and Gale (1991) present a system of this type and also describe a word-based concordance tool in which the possible translations for a given word are discovered from the corpus on the basis of a pre-computed index indicating which words in one language correspond to which words in the other.

Our approach to the problem is quite different. We use external evidence provided by a bilingual LDB to create links between pairs of bilingual texts on the basis of SL/TL translation equivalents. These links are then used by the bilingual text query system to construct parallel concordances for any form or co-occurrence of forms found in either of the two sets of texts. A preliminary version of this system is described in Marinai et al. (1991).
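How such links could drive a parallel concordance can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' query system: the function name, the representation of links as sorted pairs of source and target token positions, the fallback to the nearest preceding link, and the context `width` are all hypothetical.

```python
from bisect import bisect_right


def parallel_concordance(src, tgt, links, form, width=3):
    """Return (source context, target context) pairs for each hit of `form`.

    `links` is assumed to be a list of (src_pos, tgt_pos) pairs sorted by
    src_pos. For a hit with no link of its own, the nearest preceding link
    is used to locate the corresponding region of the target text.
    """
    if not links:
        return []
    link_positions = [s for s, _ in links]
    rows = []
    for pos, token in enumerate(src):
        if token.lower() != form.lower():
            continue
        # Index of the last link at or before this position (first link otherwise).
        k = max(bisect_right(link_positions, pos) - 1, 0)
        tgt_pos = links[k][1]
        src_context = " ".join(src[max(pos - width, 0): pos + width + 1])
        tgt_context = " ".join(tgt[max(tgt_pos - width, 0): tgt_pos + width + 1])
        rows.append((src_context, tgt_context))
    return rows


# Toy data: links join "gatto"/"cat", "dorme"/"sleeps" and "divano"/"sofa".
SRC = "il gatto dorme sul divano".split()
TGT = "the cat sleeps on the sofa".split()
LINKS = [(1, 1), (2, 2), (4, 5)]

for row in parallel_concordance(SRC, TGT, LINKS, "divano", width=1):
    print(row)
```

Because the links are anchors rather than a full word alignment, a query form that has no dictionary equivalent in the target text still yields a usable target context.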
At the moment, the system runs on a small sample set of Italian/English texts chosen to be representative of different language styles and thus to provide a suitable test-bed for performance evaluation and the definition of bilingual corpus design criteria. It is now our intention to extend these archives. In the version of the system which has been implemented in the Translator's Workstation, the translator can create a reference corpus from his own material and add new texts to it as they become available. An easy-to-use interface has been prepared to guide the translator step by step as he inputs pairs of texts to the system.

3.1 Creating a Bilingual Corpus

Given a new pair of bilingual texts, the first stage is to structure them in text database format using the DBT procedures. The texts are scanned to recognize and identify the different elements composing them. For example, word forms are distinguished from the other tokens, such as punctuation marks, numbers, and line and paragraph breaks; codes are added to distinguish between full stops and abbreviation marks, between dashes and hyphens, between the different uses of the apostrophe in Italian and in English, etc. This stage is simple, rapid and, once a few preliminary instructions have been given, automatic.

Once a pair of texts is stored in DBT format, they must be input to the text "synchronization" procedure, which establishes as many links as possible between translation equivalents in the two texts. This procedure is totally automatic and operates as follows. Each word form in the text selected as the Source text is input to the morphological analyzer for that language in order to identify its base lemma, which is then searched in the bilingual LDB. All translations given for this lemma are read and input to the morphological generator for the TL; all the forms generated are then searched over the relevant zone in the target text.
If the procedure finds more than one possible base lemma for a given form, the translations for each lemma are read. In the case of grammatical homography, it is quite possible that the translation equivalent does not respect the category of the source-language word; in the case of lexical homography, it is presumed unlikely that the translations of the 'wrong' lemma will find a correspondence in the target text. A schema of the procedure is given in Figure 1.
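The synchronization steps described above can be sketched in Python. This is a hedged illustration only, not the procedure of Figure 1: the lookup tables `LEMMAS`, `BILINGUAL` and `TL_FORMS` are hypothetical stand-ins for the morphological analyzer, the bilingual LDB and the TL morphological generator, and the "relevant zone" is assumed here to be a fixed window around the proportional position in the target text, which the paper does not specify.

```python
# Hypothetical toy resources standing in for the real components.
LEMMAS = {                      # SL analyzer: word form -> possible base lemmas
    "gatto": ["gatto"],
    "porta": ["porta", "portare"],   # homograph: noun 'door' / verb 'carries'
    "palla": ["palla"],
}
BILINGUAL = {                   # bilingual LDB: SL lemma -> TL translations
    "gatto": ["cat"],
    "porta": ["door"],
    "portare": ["carry", "bring"],
    "palla": ["ball"],
}
TL_FORMS = {                    # TL generator: lemma -> inflected forms
    "cat": {"cat", "cats"},
    "door": {"door", "doors"},
    "carry": {"carry", "carries", "carried", "carrying"},
    "bring": {"bring", "brings", "brought", "bringing"},
    "ball": {"ball", "balls"},
}


def synchronize(src, tgt, window=2):
    """Link source tokens to target tokens via dictionary translation equivalents."""
    if not tgt:
        return []
    links = []
    for i, form in enumerate(src):
        # Read the translations of every candidate lemma, as for homographs.
        candidates = set()
        for lemma in LEMMAS.get(form.lower(), []):
            for translation in BILINGUAL.get(lemma, []):
                candidates |= TL_FORMS.get(translation, {translation})
        if not candidates:
            continue
        # Assumed "relevant zone": a window around the proportional position.
        centre = round(i * (len(tgt) - 1) / max(len(src) - 1, 1))
        lo = max(centre - window, 0)
        hi = min(centre + window, len(tgt) - 1)
        matches = [j for j in range(lo, hi + 1) if tgt[j].lower() in candidates]
        if matches:
            # Keep the match closest to the expected position.
            links.append((i, min(matches, key=lambda j: abs(j - centre))))
    return links


SRC = "il gatto porta la palla".split()
TGT = "the cat carries the ball".split()
print(synchronize(SRC, TGT))
```

In the toy example the form "porta" is analysed both as the noun and as the verb; only the verb's translations occur in the target zone, so the 'wrong' lemma simply fails to produce a link, as the text anticipates for lexical homography.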

Publication date: 1992